Clean data for Abundance - size spectra and beavers
Load packages
Introduction
I created this notebook to help my collaborators and me understand which data we have and which cleaning/wrangling steps we need before fitting the individual size distribution (ISD) model proposed by Wesner et al. (2024). We sampled the arthropod community of Swiss streams using three different methods, each catching different insect families with different natural histories. From now on, we refer to the three methods as: emergent_trap, kick_net, and suction.
Please remember the main questions of the research project:
Main questions:
- Does beaver presence alter arthropod ISDs?
- To what extent does anthropogenic land area mediate the influence of beavers on arthropod ISDs?
Emergent arthropods
We load the raw data collected by Ph.D. student Valentin Moser and research assistants in June 2021 in eight different beaver-dammed streams (Figure 1) within the WSL-Eawag project: Species interactions in beaver engineered habitats link land-water ecosystem processes. They sampled two locations within each site: the main pond created by the beaver (the pool level of the location column) and a reach 500 meters upstream of that pond (the control level of the location column).
In the folder data/raw you will find: data_arthropods_flying.xlsx
Emergent arthropods were collected with emergence traps (surface area 0.25 \(m^2\)) covered with a 500 \(\mu m\) mesh net. White collection containers were installed on top of the pyramid-shaped traps. These made it easy to sample the arthropods with a self-made aspirator.
Load raw data
Code
# A tibble: 6 × 14
site location date class order suborder family subfamily juvenile
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <lgl> <dbl>
1 Chrie Control June Arachnida Araneae <NA> <NA> NA 0
2 Chrie Control June Arachnida Araneae <NA> <NA> NA 0
3 Chrie Control June Arachnida Araneae <NA> <NA> NA 0
4 Chrie Control June Arachnida Araneae <NA> <NA> NA 0
5 Chrie Control June Insecta Coleoptera Adephaga Dytis… NA 0
6 Chrie Control June Insecta Coleoptera Phytophaga Curcu… NA 0
# ℹ 5 more variables: terrestrial <dbl>, juv.aquatic <dbl>, unusable <dbl>,
# length <dbl>, Laufnummer <dbl>
Metadata:
Code
data_dict <- data.frame(
Column_name = c(
"site", "location", "date", "class", "order", "suborder", "family",
"subfamily", "juvenile", "terrestrial", "juv aquatic", "unusable",
"length", "remarks"
),
Explanation = c(
"Study system in which the individual was sampled",
"Site at which individual was sampled (i.e. pool or control)",
"Collection period during which the individual was sampled (i.e. June or July)",
"Taxonomic class the sampled individual belongs to",
"Taxonomic order the sampled individual belongs to",
"Taxonomic suborder the sampled individual belongs to (where available)",
"Taxonomic family the sampled individual belongs to (where available)",
"Taxonomic subfamily the sampled individual belongs to (where available)",
"Binary value indicating if the sampled individual was a juvenile. 1 = juvenile, 0 = adult",
"Binary value indicating if the sampled individual was winged. 1 = non-winged, 0 = winged",
"Binary value indicating if the sampled individual belongs to a taxa with purely aquatic juveniles. 1 = aquatic, 0 = not aquatic",
"Binary value indicating if the sampled individual came from water (amphipoda, gastropoda)",
"Length of the sampled individual in mm (rounded to full numbers). Measured from head to end of abdomen, excluding appendages (wings, limbs, antennae, cerci, etc.)",
"Additional comments or information about a given individual"
)
)
kable(data_dict, align = c("l","l"), caption = "Metadata for data_arthropods_flying.xlsx")
| Column_name | Explanation |
|---|---|
| site | Study system in which the individual was sampled |
| location | Site at which individual was sampled (i.e. pool or control) |
| date | Collection period during which the individual was sampled (i.e. June or July) |
| class | Taxonomic class the sampled individual belongs to |
| order | Taxonomic order the sampled individual belongs to |
| suborder | Taxonomic suborder the sampled individual belongs to (where available) |
| family | Taxonomic family the sampled individual belongs to (where available) |
| subfamily | Taxonomic subfamily the sampled individual belongs to (where available) |
| juvenile | Binary value indicating if the sampled individual was a juvenile. 1 = juvenile, 0 = adult |
| terrestrial | Binary value indicating if the sampled individual was winged. 1 = non-winged, 0 = winged |
| juv aquatic | Binary value indicating if the sampled individual belongs to a taxa with purely aquatic juveniles. 1 = aquatic, 0 = not aquatic |
| unusable | Binary value indicating if the sampled individual came from water (amphipoda, gastropoda) |
| length | Length of the sampled individual in mm (rounded to full numbers). Measured from head to end of abdomen, excluding appendages (wings, limbs, antennae, cerci, etc.) |
| remarks | Additional comments or information about a given individual |
Question:
How many taxa are there, and which ones?
Code
Number of unique families: 29
Code
cat("Families:\n")
Families:
[1] "Aphidoidea" "Cantharidae" "Carabidae" "Chrysomelidae"
[5] "Coccinellidae" "Cucujidae" "Curculionidae" "Dytiscidae"
[9] "Elmidae" "Erebidae" "Forficulidae" "Formicidae"
[13] "Gerridae" "Gyrinidae" "Haliplidae" "Hydrophilidae"
[17] "Latridiidae" "Monotomidae" "Mordellidae" "Nitidulidae"
[21] "Notonectidae" "Panorpidae" "Phalacridae" "Psylloidea"
[25] "Scirtidae" "Staphylinidae" "Syrphidae" "Tabanidae"
[29] "Vespidae"
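The code behind the family count above is folded in the rendered notebook. Below is a minimal base R sketch of the same check. It runs on a small hypothetical data frame (toy_d), not on the real data. It also drops missing family entries before counting; whether the original count did the same is an assumption.

```r
# Hypothetical toy data standing in for the emergence-trap table d
toy_d <- data.frame(
  family = c("Carabidae", "Carabidae", NA, "Gerridae", "Syrphidae"),
  length = c(4, 6, 2, 9, 11)
)

# Unique, non-missing family names in alphabetical order
families <- sort(unique(toy_d$family[!is.na(toy_d$family)]))

cat("Number of unique families:", length(families), "\n")
print(families)
```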
Description of the families we have found:
Code
# Families Found Near Streams in Switzerland
families_table <- data.frame(
Family = c(
"Aphidoidea", "Cantharidae", "Carabidae", "Chrysomelidae",
"Coccinellidae", "Cucujidae", "Curculionidae", "Dytiscidae",
"Elmidae", "Erebidae", "Forficulidae", "Formicidae",
"Gerridae", "Gyrinidae", "Haliplidae", "Hydrophilidae",
"Latridiidae", "Monotomidae", "Mordellidae", "Nitidulidae",
"Notonectidae", "Panorpidae", "Phalacridae", "Psylloidea",
"Scirtidae", "Staphylinidae", "Syrphidae", "Tabanidae",
"Vespidae"
),
Description = c(
"Aphids; plant sap-feeders, often found on riparian vegetation.",
"Soldier beetles; predatory or nectar-feeding, common in meadows near water.",
"Ground beetles; many species are predators along stream banks.",
"Leaf beetles; herbivores on riparian plants.",
"Lady beetles; mostly aphid predators on vegetation.",
"Flat bark beetles; live under bark, sometimes in moist riparian wood.",
"Weevils; herbivores feeding on riparian plants and shrubs.",
"Predaceous diving beetles; aquatic predators in streams and ponds.",
"Riffle beetles; aquatic, live attached to stones in running water.",
"Tiger moths and relatives; larvae feed on diverse plants near water.",
"Earwigs; omnivores hiding under stones and wood along streams.",
"Ants; common in soils and vegetation along riparian zones.",
"Water striders; aquatic predators skating on water surfaces.",
"Whirligig beetles; fast swimmers on water surfaces in streams.",
"Crawling water beetles; small herbivorous beetles in shallow water.",
"Water scavenger beetles; aquatic or semi-aquatic scavengers.",
"Minute brown scavenger beetles; found in decaying plant matter.",
"Root-eating beetles; often associated with decaying wood.",
"Tumbling flower beetles; found on flowers near riparian habitats.",
"Sap beetles; feed on decaying fruit, fungi, and plant material.",
"Backswimmers; aquatic predators that swim upside down.",
"Scorpionflies; scavengers, often in damp shaded stream habitats.",
"Shining flower beetles; small pollen feeders.",
"Psyllids; plant sap-feeders, often on riparian trees and shrubs.",
"Marsh beetles; aquatic or semi-aquatic beetles in wetlands.",
"Rove beetles; very diverse predators and scavengers in moist habitats.",
"Hoverflies; larvae are aphid predators, adults visit flowers.",
"Horse flies; adults feed on blood or nectar, larvae in wet soils.",
"Wasps (social and potter wasps); mostly predators of other insects, adults also visit flowers."
)
)
kable(families_table, caption = "Ecological roles of arthropod families sampled close to streams in Switzerland")
| Family | Description |
|---|---|
| Aphidoidea | Aphids; plant sap-feeders, often found on riparian vegetation. |
| Cantharidae | Soldier beetles; predatory or nectar-feeding, common in meadows near water. |
| Carabidae | Ground beetles; many species are predators along stream banks. |
| Chrysomelidae | Leaf beetles; herbivores on riparian plants. |
| Coccinellidae | Lady beetles; mostly aphid predators on vegetation. |
| Cucujidae | Flat bark beetles; live under bark, sometimes in moist riparian wood. |
| Curculionidae | Weevils; herbivores feeding on riparian plants and shrubs. |
| Dytiscidae | Predaceous diving beetles; aquatic predators in streams and ponds. |
| Elmidae | Riffle beetles; aquatic, live attached to stones in running water. |
| Erebidae | Tiger moths and relatives; larvae feed on diverse plants near water. |
| Forficulidae | Earwigs; omnivores hiding under stones and wood along streams. |
| Formicidae | Ants; common in soils and vegetation along riparian zones. |
| Gerridae | Water striders; aquatic predators skating on water surfaces. |
| Gyrinidae | Whirligig beetles; fast swimmers on water surfaces in streams. |
| Haliplidae | Crawling water beetles; small herbivorous beetles in shallow water. |
| Hydrophilidae | Water scavenger beetles; aquatic or semi-aquatic scavengers. |
| Latridiidae | Minute brown scavenger beetles; found in decaying plant matter. |
| Monotomidae | Root-eating beetles; often associated with decaying wood. |
| Mordellidae | Tumbling flower beetles; found on flowers near riparian habitats. |
| Nitidulidae | Sap beetles; feed on decaying fruit, fungi, and plant material. |
| Notonectidae | Backswimmers; aquatic predators that swim upside down. |
| Panorpidae | Scorpionflies; scavengers, often in damp shaded stream habitats. |
| Phalacridae | Shining flower beetles; small pollen feeders. |
| Psylloidea | Psyllids; plant sap-feeders, often on riparian trees and shrubs. |
| Scirtidae | Marsh beetles; aquatic or semi-aquatic beetles in wetlands. |
| Staphylinidae | Rove beetles; very diverse predators and scavengers in moist habitats. |
| Syrphidae | Hoverflies; larvae are aphid predators, adults visit flowers. |
| Tabanidae | Horse flies; adults feed on blood or nectar, larvae in wet soils. |
| Vespidae | Wasps (social and potter wasps); mostly predators of other insects, adults also visit flowers. |
Plot data
Code
# Alternative: side-by-side histogram (use 'dodge')
ggplot(d, aes(x = length, fill = location)) +
geom_histogram(binwidth = 1, color = "black", alpha = 0.7, position = "dodge") +
labs(
title = "Histogram of individual sizes by sampled location",
x = "Size (mm)",
y = "Frequency",
fill = "Location"
) +
theme_bw(base_size = 14)
Code
# with stat halfeye
ggplot(d, aes(y = location, x = length, fill = location)) +
stat_halfeye(position = "dodge",
adjust = 1, # smoothness of density
width = 0.6, # width of half-eye
justification = -0.1,
point_interval = mean_qi, # mean with quantile intervals (ggdist default .width = c(0.66, 0.95))
alpha = 0.7
) +
labs(
title = "Size Distributions by Location",
y = "Location",
x = "Size (mm)"
) +
theme_bw(base_size = 14)
Macrozoobenthos: kick net
The macrozoobenthos data we use were collected within the WSL-Eawag project: Species interactions in beaver engineered habitats link land-water ecosystem processes. The collectors were Ph.D. student Valentin Moser, UZH Master's student Dominic Tinner, and Patrick Hofmann. They sampled 14 streams with beaver presence across Switzerland (Figure 3). The streams varied in surrounding landscape (open landscape or forest), beaver pond area, and stream ecomorphology, i.e., the structural characteristics of the stream.
In each stream, four distinct locations were sampled (Figure 4):
- the lotic-lentic transition upstream of the dam (inflow);
- the stagnant water behind the dam (pond);
- a lotic stream section 25 m downstream of the dam (outflow);
- a reach without any influence of beaver engineering (control).

The control location was 500 metres upstream of the main dam at eight sites and 500 metres downstream at the other eight sites. This design assumes that the control location represents the stream habitat as it would have existed before the beaver arrived. We can therefore assess the beavers' effect on the freshwater ecosystem by comparing the sections directly affected by the dams (inflow, pond, outflow) with those in their initial state (control).
The aquatic invertebrates were sampled in May/June 2021 or 2022 with the kick-net method. During kick-net sampling, the collector kicks up the sediment in a 0.25 x 0.25 m area of the stream for 20 seconds. A net placed downstream of the sampling area collects the swirled-up invertebrates. Each sampling location (e.g. inflow) was sampled four times consecutively to capture different microhabitats: twice in organic substrates (e.g., submerged macrophytes, roots) and twice in non-organic substrates (e.g., gravel, sand). The content of the net was then transferred to plastic tubs, where aquatic invertebrates were picked out by hand for 15 minutes. Afterwards, the remaining content was washed in the stream (similar to gold panning), leaving only fine sediment with the remaining individuals. This was stored together with the picked-out invertebrates in 97% ethanol. Sampling in each stream was completed within one day, moving upstream to avoid impacting subsequent samples.
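Each kick covers a fixed area, so counts can be expressed per square metre. Below is a small arithmetic sketch in R. It assumes the four consecutive kicks at a location are pooled into one sample; the text does not say whether replicates were pooled. n_individuals is a hypothetical count.

```r
# Area of one kick-net sample: 0.25 m x 0.25 m
kick_area_m2 <- 0.25 * 0.25                  # 0.0625 m^2

# Four consecutive kicks per sampling location (2 organic, 2 non-organic)
n_kicks <- 4
location_area_m2 <- n_kicks * kick_area_m2   # 0.25 m^2

# Hypothetical pooled count for one location, converted to a density
n_individuals <- 130
density_per_m2 <- n_individuals / location_area_m2
density_per_m2                               # 520 individuals per m^2
```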
Predictors:
Land use intensity
Valentin Moser extracted land-use intensity data for each study site from Geodienste.ch and swisstopo.admin.ch using ArcGIS Pro (v. 2.8, Esri Inc., 2021) and QGIS (3.40 'Bratislava', 2024). A radius of 250 meters around the centre of each Pool and Control location was chosen to avoid overlap between paired sampling areas. A visual inspection showed highly reliable classification of two land-use types:
- agricultural (e.g., crop fields, pastures, biodiversity areas maintained by farmers);
- natural (e.g., forests, riparian areas).

Areas without classified land use were very often urban areas and human infrastructure, such as streets, railways, and housing. Missing data were therefore assigned to urban land use. For each site, V. Moser then averaged the agricultural, urban, and natural land-use cover across the Pool and Control areas. Finally, we aggregate the agricultural and urban land-cover estimates into an overall human-impact variable called human_influence.
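The land-use steps above can be sketched in base R. This is a hypothetical illustration, not V. Moser's actual workflow. The following are all assumptions:
- the column names (percentage cover of agricultural and natural land);
- the toy values;
- computing urban cover as the unclassified remainder;
- combining agricultural and urban cover by summing.

```r
# Hypothetical land-use cover (%) within the 250 m buffers,
# one row per site x location (Pool / Control)
lu <- data.frame(
  site         = c("A", "A", "B", "B"),
  location     = c("Pool", "Control", "Pool", "Control"),
  agricultural = c(40, 30, 10, 20),
  natural      = c(50, 60, 85, 70)
)

# Unclassified area is assigned to urban land use (assumed: the remainder to 100 %)
lu$urban <- 100 - lu$agricultural - lu$natural

# Average each cover type per site across Pool and Control
site_lu <- aggregate(cbind(agricultural, urban, natural) ~ site,
                     data = lu, FUN = mean)

# Combine agricultural and urban cover into one human-impact variable (assumed: sum)
site_lu$human_influence <- site_lu$agricultural + site_lu$urban
site_lu
```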
Stream ecomorphology
The ecomorphology classification estimates the degree of anthropogenic impact from factors such as modifications of the riparian zone and structural alterations of the stream bed. The streams in this study belonged to the first three of the five ecomorphology categories: near-natural (category 1), slightly impacted (2), and heavily impacted (3). The ecomorphology value of each stream was retrieved from the Swiss Geoportal, specifically the map layer 'Ecomorphology Level F – River reaches'.
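Ecomorphology is an ordinal class rather than a continuous measurement. One option is therefore to store it as an ordered factor before modelling. Below is a minimal sketch. It assumes the column holds the integer codes 1 to 3, and its labels paraphrase the categories described above. The codes in the example are hypothetical.

```r
# Hypothetical ecomorphology codes for a few streams (only 1-3 occur in this study)
ecomorphology <- c(1, 3, 2, 2, 1)

eco_f <- factor(ecomorphology,
                levels  = 1:3,
                labels  = c("near-natural", "slightly impacted", "heavily impacted"),
                ordered = TRUE)

table(eco_f)
eco_f[2] > eco_f[1]   # ordered comparison: heavily impacted > near-natural
```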
Load raw data
Let’s load the covariates/predictors we are interested in:
Merge data frames:
Merge the human influence variable derived by V. Moser into the macrozoobenthos data frame:
Code
m2 <- m1 |>
left_join(h_unique, by = "site")
Now we are going to:
- Rename the existing location column to location_old.
- Create a new location column in which the first three categories (inflow, outflow, pool) are grouped under "beaver", while the fourth category (control) stays as it is.
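The join and the recoding described above can be sketched in base R; the notebook itself uses dplyr. The toy data frames are hypothetical. The lower-case location labels are also an assumption, so adjust the strings to the actual factor levels. The row-count check guards against a join silently duplicating individuals when the lookup table has more than one row per site.

```r
# Hypothetical stand-ins for m1 (individuals) and h_unique (one row per site)
m1 <- data.frame(
  site     = c("A", "A", "B", "B"),
  location = c("inflow", "control", "pool", "outflow"),
  length   = c(3, 5, 2, 7)
)
h_unique <- data.frame(site = c("A", "B"), human_influence = c(0.6, 0.4))

# Left join: keep every individual and add the site-level covariate
stopifnot(!anyDuplicated(h_unique$site))   # one row per site, so no duplication
m2 <- merge(m1, h_unique, by = "site", all.x = TRUE, sort = FALSE)
stopifnot(nrow(m2) == nrow(m1))

# Keep the original four-level column and collapse the beaver-affected levels
m2$location_old <- m2$location
m2$location <- ifelse(m2$location_old %in% c("inflow", "outflow", "pool"),
                      "beaver", m2$location_old)
table(m2$location_old, m2$location)
```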
Plot the standardized human influence by site:
Code
# standardise human_influence (z-score)
h_std <- h %>%
mutate(human_influence_std = scale(human_influence)[,1])
# plot
ggplot(h_std, aes(x = site, y = human_influence_std)) +
geom_point(color = "steelblue", size = 3, shape = 21) +
geom_hline(yintercept = 0)+
labs(
title = "Standardised Human Influence by Site",
x = "Site",
y = "Human Influence (standardised)"
) +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
The sites with above-average human influence (> 0) are: Biber, Chriesbach, Ellikon, Gaebelbach, Gile, Leugene, Rot, Tegel, Weierbach.
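scale() centres each value on the mean and divides by the sample standard deviation, z = (x - mean(x)) / sd(x). A value of z > 0 therefore marks a row above the average. The mean and SD are computed over the rows of the data frame. "Above average" only refers to the average site if the table has exactly one row per site. A short check with hypothetical values shows that the manual formula and scale() agree.

```r
# Hypothetical human_influence values for five sites
hi <- c(A = 0.20, B = 0.40, C = 0.70, D = 0.30, E = 0.60)

# z-score by hand: centre on the mean, divide by the sample SD
z_manual <- (hi - mean(hi)) / sd(hi)

# scale() returns a one-column matrix; [, 1] turns it back into a vector
z_scale <- scale(hi)[, 1]

all.equal(z_manual, z_scale)   # TRUE
names(hi)[z_manual > 0]        # sites above the average: "C" "E"
```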
Plot
Code
# Alternative: side-by-side histogram (use 'dodge')
ggplot(m3, aes(x = length, fill = location)) +
geom_histogram(binwidth = 1, color = "black", alpha = 0.7, position = "dodge") +
labs(
title = "Histogram of individual sizes by sampled location",
x = "Size (mm)",
y = "Frequency",
fill = "Location"
) +
theme_bw(base_size = 14)
Code
# Summarize stats per location
stats <- m3 %>%
group_by(location) %>%
summarise(
n = n(),
mean_size = mean(length, na.rm = TRUE),
sd_size = sd(length, na.rm = TRUE),
median_size = median(length, na.rm = TRUE)
)
# with stat halfeye
ggplot(m3, aes(y = location, x = length, fill = location)) +
stat_halfeye(
position = "dodge",
adjust = 1.1,
width = 0.6,
justification = -0.1,
point_interval = mean_qi,
alpha = 0.7
) +
geom_text(
data = stats,
aes(
x = 20, # fixed at 20 mm
y = location, # map y to location
label = paste0("n=", n, "\nmean=", round(mean_size,1),
"\nsd=", round(sd_size,1),
"\nmedian=", round(median_size,1))
),
inherit.aes = FALSE,
hjust = 0,
size = 3.5
) +
labs(
title = "Size Distributions by Location",
y = "Location",
x = "Size (mm)"
) +
theme_bw(base_size = 14)
Save the data to the processed folder
From length to dry weight
Following J. Pomeranz, we estimate arthropod body mass using taxon-specific published length-weight regressions (Pomeranz, Junker, and Wesner 2022). Note that ISD relationships are sensitive to the undersampling of small body sizes. Perkins et al. (2018) sampled benthic macroinvertebrates using comparable methods and found that body sizes smaller than 0.0026 mg were undersampled. Before estimating ISD relationships, we therefore set the minimum body size to 0.0026 mg estimated dry weight, so that the undersampled small sizes do not bias the ISD estimates.
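Length-weight regressions of this kind are usually power laws, M = a L^b, often fitted on the log scale as ln M = ln a + b ln L. Below is a minimal base R sketch that converts lengths to dry mass and applies the 0.0026 mg cut-off. The coefficients a and b are made-up placeholders, not values from Pomeranz, Junker, and Wesner (2022). The real coefficient tables may also use other units or log bases, so check the form of each equation before applying it.

```r
# Hypothetical coefficients for one taxon (placeholders, NOT published values):
# dry mass (mg) = a * length(mm)^b
a <- 0.005
b <- 2.5

lengths_mm <- c(0.5, 1, 2, 4, 8)

# Power-law length-weight conversion
dw_mg <- a * lengths_mm^b

# Equivalent log-scale form: ln(M) = ln(a) + b * ln(L)
dw_mg_log <- exp(log(a) + b * log(lengths_mm))
stopifnot(isTRUE(all.equal(dw_mg, dw_mg_log)))

# Drop individuals below the undersampling threshold (Perkins et al. 2018)
min_dw_mg <- 0.0026
keep <- dw_mg >= min_dw_mg
data.frame(length_mm = lengths_mm, dw_mg = signif(dw_mg, 3), kept = keep)
```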
Load processed data
# A tibble: 0 × 13
# ℹ 13 variables: ...1 <dbl>, laufnummer <dbl>, site <chr>, location_old <chr>,
# order <chr>, family <chr>, taxon_lowest <chr>, length <dbl>,
# ecomorphology <dbl>, area_pool <dbl>, human_influence <dbl>,
# location <chr>, method <chr>
Code
# the two columns are identical, so we drop taxon_lowest
m4 <- m3 |> select(-taxon_lowest)
Extract the unique order-family combinations and check how many families there are.
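Two small checks fit this step. The first confirms that taxon_lowest really duplicates family before it is dropped. The second lists the unique order-family combinations. Below is a base R sketch on a hypothetical data frame (toy_m3). Pairing taxon_lowest with family is an assumption here. identical() treats NA values in the same positions as equal.

```r
# Hypothetical stand-in for m3
toy_m3 <- data.frame(
  order        = c("Ephemeroptera", "Ephemeroptera", "Diptera", "Diptera"),
  family       = c("Baetidae", "Baetidae", "Chironomidae", NA),
  taxon_lowest = c("Baetidae", "Baetidae", "Chironomidae", NA)
)

# Drop taxon_lowest only if it matches family exactly (NAs included)
stopifnot(identical(toy_m3$family, toy_m3$taxon_lowest))
toy_m4 <- toy_m3[, setdiff(names(toy_m3), "taxon_lowest")]

# Unique order-family combinations and the number of distinct (non-NA) families
order_family <- unique(toy_m4[, c("order", "family")])
order_family
n_families <- length(unique(toy_m4$family[!is.na(toy_m4$family)]))
n_families
```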
Code
# Load coefficient datasets
coeff_sol <- read_csv(here("data","raw","lw","sohlstrom.csv"))
coeff_pom_raw <- read_csv(here("data","raw","lw","LW_coeffs.csv"))
# Unique family levels
levels_m4 <- unique(m4$family)
levels_sol <- unique(coeff_sol$family)
levels_pom <- unique(coeff_pom_raw$family)
# Total families in m4
total_m4 <- length(levels_m4)
# Count matches
matches_sol <- length(intersect(levels_m4, levels_sol))
matches_pom <- length(intersect(levels_m4, levels_pom))
# Print results
cat("Total number of families in m4:", total_m4, "\n")
Total number of families in m4: 77
Code
cat("Number of m4 families present in coeff_sol:", matches_sol, "\n")
Number of m4 families present in coeff_sol: 1
Code
cat("Number of m4 families present in coeff_pom_raw:", matches_pom, "\n")
Number of m4 families present in coeff_pom_raw: 47
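Only 47 of the 77 families have a family-level entry in LW_coeffs.csv. One common way to fill the gaps is to fall back to a coarser taxonomic level, such as order, when no family-level coefficient exists. Whether the linked get_neon_body_sizes repository does exactly this should be checked there. The sketch below uses hypothetical coefficient tables with columns taxon, a and b. These column names and all coefficient values are made up, not taken from the real files.

```r
# Hypothetical individuals and coefficient tables (placeholder values)
ind <- data.frame(
  order  = c("Ephemeroptera", "Diptera", "Plecoptera"),
  family = c("Baetidae", "Limoniidae", "Nemouridae")
)
coef_family <- data.frame(taxon = c("Baetidae", "Nemouridae"),
                          a = c(0.01, 0.02), b = c(2.5, 2.4))
coef_order  <- data.frame(taxon = c("Diptera", "Ephemeroptera"),
                          a = c(0.003, 0.007), b = c(2.7, 2.8))

# Families without a family-level coefficient
missing_fam <- setdiff(unique(ind$family), coef_family$taxon)

# Look up family first; where that fails, fall back to order
i_fam <- match(ind$family, coef_family$taxon)
i_ord <- match(ind$order,  coef_order$taxon)

ind$a <- ifelse(!is.na(i_fam), coef_family$a[i_fam], coef_order$a[i_ord])
ind$b <- ifelse(!is.na(i_fam), coef_family$b[i_fam], coef_order$b[i_ord])
ind$coef_level <- ifelse(!is.na(i_fam), "family",
                         ifelse(!is.na(i_ord), "order", NA))
ind
```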
Please refer to this publication and to https://github.com/jswesner/get_neon_body_sizes.
Terrestrial arthropods: suction
These data were collected by Ph.D. student Valentin Moser and Master's students in 2021 and 2022 in 16 different streams in Switzerland. They sampled terrestrial arthropods in a 5 x 1 m plot located one meter from the stream edge, at the centre of the beaver pool area and of the control area. Arthropods were sampled at the two ends of each plot in cylindrical baskets (50 cm diameter, 67 cm height, woven fabric). Suction sampling took place on a sunny day between 10:00 and 17:00, during peak arthropod activity. The samples were stored in ethanol. Individuals were then counted, measured, and identified to order level under a binocular microscope.
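Each suction sample is taken inside a cylindrical basket of 50 cm diameter, so the ground area enclosed follows from the circle area, pi r^2. Below is a short arithmetic sketch. It assumes the two baskets per plot are pooled into one sample, which the text does not state. n_individuals is a hypothetical count.

```r
basket_diameter_m <- 0.50
basket_area_m2 <- pi * (basket_diameter_m / 2)^2   # about 0.196 m^2 per basket

n_baskets <- 2                                     # the two ends of the 5 x 1 m plot
plot_area_m2 <- n_baskets * basket_area_m2         # about 0.393 m^2

n_individuals <- 57                                # hypothetical pooled count
density_per_m2 <- n_individuals / plot_area_m2
round(c(basket_area_m2 = basket_area_m2,
        plot_area_m2   = plot_area_m2,
        density_per_m2 = density_per_m2), 3)
```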
Load raw data
Code
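t21 <- readxl::read_xlsx(path = here("data", "raw", "data_arthropods_terrestrial_2021.xlsx"), sheet = 1) |>
mutate(year = 2021) |>               # tag the sampling year
rename(laufnummer = Laufnummer) |>   # lower-case key, as in site_info
select(-c(remarks, adult))           # drop columns not used downstream
# load site info (keep only the needed columns, selected by position)
site_info <- read_csv(here("data", "raw", "site_info.csv")) |> select(c(2:5, 8, 9, 19, 23))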
# check common columns
intersect(names(t21), names(site_info))
[1] "laufnummer"
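laufnummer is the only shared column, which makes it the natural join key. Before joining site_info onto t21, check that the key is unique in site_info and that every individual finds a match. Otherwise the join silently duplicates rows or leaves covariates missing. Below is a base R sketch with hypothetical toy tables; the values are not from the real files.

```r
# Hypothetical stand-ins for t21 and site_info
toy_t21 <- data.frame(laufnummer = c(72, 72, 73),
                      order = c("Araneae", "Coleoptera", "Diptera"),
                      size = c(3, 5, 2), year = 2021)
toy_site_info <- data.frame(laufnummer = c(72, 73), site = c("S1", "S2"),
                            location = c("Outflow", "Control"))

# The key must identify exactly one row of site_info
stopifnot(!anyDuplicated(toy_site_info$laufnummer))

# Individuals whose laufnummer has no match in site_info (should be none)
unmatched <- setdiff(toy_t21$laufnummer, toy_site_info$laufnummer)
unmatched

t21_joined <- merge(toy_t21, toy_site_info, by = "laufnummer", all.x = TRUE)
stopifnot(nrow(t21_joined) == nrow(toy_t21))
t21_joined
```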
Code
# A tibble: 1 × 15
latitude_sample longitude_sample laufnummer year site sample location
<dbl> <dbl> <dbl> <dbl> <chr> <chr> <chr>
1 47.6 9.18 72 2021 Logge Outflow_3 Outflow
# ℹ 8 more variables: ecomorphology <dbl>, area_pool <dbl>,
# samples_replicate <chr>, class <chr>, order <chr>, suborder <chr>,
# family <chr>, size <dbl>